All Questions
Tagged with scikit-learn python
1,004 questions
2 votes
0 answers
9 views
Low Accuracy from Geospatial Random Forest ML Modeling - Training Exported from QGIS, SCP
I am doing a geospatial assessment integrated with ML modeling. The problem is very low accuracy: as the number of training features increases, the accuracy gets lower. What could be the solution to such a ...
0 votes
0 answers
12 views
Isolation Forest sample size
I am using sklearn's Isolation Forest as a model to detect anomalies. My dataset is relatively small, 50 records with only 2-3 features. To prevent any overfitting, what would you recommend to tune ...
3 votes
1 answer
37 views
Confirm understanding of decision_function in Isolation Forest sklearn
I am looking to better understand sklearn IsolationForest decision_function. My understanding is that if the metric is closer to -1 then the model is more confident ...
2 votes
0 answers
20 views
Preprocessing multi-value attributes in a dataframe, similar to nominal
Description: The input is a CSV file. The CSV file contains columns of different data types: ordinal values, nominal values, numerical values, and multi-value. For the multi-value columns, the minimum is 1, ...
2 votes
1 answer
44 views
I can't get my R² above 70%
I tried RandomForest, LGBM, KNeighbors, and polynomial regression as algorithms, along with cross-validation, a train/test split, and a standard scaler, but nothing seems to get it past the 70% mark. The dataframe has ...
0 votes
0 answers
25 views
Agglomerative clustering assigns 98% of my data to 1 cluster. Why?
I have a JSD distance matrix that I'm trying to cluster. When generating 24 clusters (roughly the number that shows up on the clustermap), it assigns the vast majority of the data to 1 cluster. Weirdly ...
1 vote
0 answers
40 views
OneClassSVM super slow training with poly kernel
In contrast to questions like here, where a slow SVM training results from a high number of samples, I only have around 500 samples. Still, a single training fold (cross-validation) takes several ...
1 vote
1 answer
35 views
scipy bootstrap generates input with inconsistent numbers of samples
I have a dataset of 77 samples, and I am using scipy bootstrap to get a confidence interval to estimate the precision. I am baffled to see that it generates input variables with inconsistent numbers ...
2 votes
1 answer
71 views
Why does the LightGBM .predict function return probabilities not between 0 and 1?
I want to understand why, in this code, I get the following results: ...
1 vote
1 answer
48 views
Manual Python Implementation of Stacking Model
I tried to build a Python class, CustomStackingClassifier(), to implement the Stacking method in ensemble machine learning. In this implementation, the output of the base classifiers is set to be the ...
3 votes
1 answer
81 views
Comparing clusterings from different datasets
I have 2 different data sets with essentially the same variables, though one is data from one year and the other is data from another year. I've run KModes on both data sets and now have some ...
2 votes
2 answers
228 views
Fitting Rotated Curve
I'm trying to fit a rotated parabola with curve_fit, but it doesn't fit well as shown below: I'm already trying to fit the curve with respect to the cos(𝜃) and ...
0 votes
0 answers
37 views
scikit-learn upgrade - how to fix breaking change?
I've inherited a solution that runs on Databricks Runtime 7.3 and uses scikit-learn 0.21. The Databricks runtime must be upgraded, and so the existing scikit-learn version is not compatible with ...
0 votes
1 answer
79 views
As an intermediate R programmer looking to dive into machine learning, should I choose Python or stick with R?
Background: I am an intermediate R programmer with some experience in machine learning concepts and simple modeling in R. I have an opportunity to collaborate with a professional machine learning team ...
0 votes
0 answers
39 views
Keep training a PyTorch model on new data
I'm working on a text classification task and have decided to use a PyTorch model for this purpose. The process mainly involves the following steps: Load and process the text. Use a TF-IDF Vectorizer....